
31 research outputs found

Linguistic calibration through metacognition: aligning dialogue agent responses with expected correctness

Author: Boureau Y-Lan
Dinan Emily
Mielke Sabrina J.
Szlam Arthur
Publication date: 29/12/2020

Open-domain dialogue agents have vastly improved, but still confidently hallucinate knowledge or express doubt when asked straightforward questions. In this work, we analyze whether state-of-the-art chit-chat models can express metacognition capabilities through their responses: does a verbalized expression of doubt (or confidence) match the likelihood that the model's answer is incorrect (or correct)? We find that these models are poorly calibrated in this sense, yet we show that the representations within the models can be used to accurately predict likelihood of correctness. By incorporating these correctness predictions into the training of a controllable generation model, we obtain a dialogue agent with greatly improved linguistic calibration.

arXiv.org e-Print Archive

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

Author: Cotterell Ryan
Heinz Jeffrey
Hulden Mans
Kirov Christo
Malaviya Chaitanya
McCarthy Arya D.
Mielke Sabrina J.
Nicolai Garrett
Silfverberg Miikka
Vylomova Ekaterina
Wolf-Sonkin Lawrence
Wu Shijie
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years' inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low-resource language. This year also presents a new second challenge on lemmatization and morphological feature analysis in context. All submissions featured a neural component and built on either this year's strong baselines or highly ranked systems from previous years' shared tasks. Every participating team improved in accuracy over the baselines for the inflection task (though not Levenshtein distance), and every team in the contextual analysis task improved on both state-of-the-art neural and non-neural baselines. Comment: Presented at SIGMORPHON 201

arXiv.org e-Print Archive

Crossref